Pose Compensation for Bimodal Speech Recognition

Authors

  • Jianbo Ma
  • Chalapathy Neti
  • Andrew W. Senior
Abstract

Lip reading has been shown to improve speech recognition accuracy in adverse environments. Most existing lip reading systems assume a frontal pose, which makes them very difficult to use in tasks such as video transcription (speech recognition of the audio stream for video indexing and retrieval). In this paper, we propose a new method that compensates for lip pose changes by exploiting the general symmetry of the human face. From the imaging geometry, we show that a frontal view of the lips can be recovered from a single profile view. The resulting pose compensation method has the following advantages: (1) it requires only one profile image; (2) it does not need a 3D model; (3) it does not need an accurate lip shape contour. Experimental results are given to show the effectiveness of our method.

Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert spoken utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Bimodal speech recognition using coupled hidden Markov models

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences betwee...

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...

CENSREC-AV: evaluation frameworks for audio-visual speech recognition

This paper introduces forthcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. To develop robust speech recognition in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has received particular attention over the past decade. As a lot of methods and techniques for bimodal speech recognition have b...

Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model

Recent years have seen higher demand for automatic speech recognition (ASR) systems that can operate robustly in acoustically noisy environments. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built based on dependently trained audio-HMM and visual-HMM, reflecting the asynchronous characterist...

Publication date: 1999